{
 "cells": [
  {
   "cell_type": "markdown",
   "metadata": {
    "ExecuteTime": {
     "end_time": "2019-12-04T13:44:39.449161Z",
     "start_time": "2019-12-04T13:44:39.443285Z"
    }
   },
   "source": [
    "<style>\n",
    "pre {\n",
    " white-space: pre-wrap !important;\n",
    "}\n",
    ".table-striped > tbody > tr:nth-of-type(odd) {\n",
    "    background-color: #f9f9f9;\n",
    "}\n",
    ".table-striped > tbody > tr:nth-of-type(even) {\n",
    "    background-color: white;\n",
    "}\n",
    ".table-striped td, .table-striped th, .table-striped tr {\n",
    "    border: 1px solid black;\n",
    "    border-collapse: collapse;\n",
    "    margin: 1em 2em;\n",
    "}\n",
    ".rendered_html td, .rendered_html th {\n",
    "    text-align: left;\n",
    "    vertical-align: middle;\n",
    "    padding: 4px;\n",
    "}\n",
    "</style>"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# Machine Learning (basic): the Iris dataset\n",
    "\n",
    "If you want to try out this notebook with a live Python kernel, use mybinder:\n",
    "\n",
    "<a class=\"reference external image-reference\" href=\"https://mybinder.org/v2/gh/vaexio/vaex/latest?filepath=docs%2Fsource%2Fexample_ml_iris.ipynb\"><img alt=\"https://mybinder.org/badge_logo.svg\" src=\"https://mybinder.org/badge_logo.svg\" width=\"150px\"></a>\n",
    "\n",
    "While `vaex.ml` does not yet implement predictive models, we provide wrappers to powerful libraries (e.g. [Scikit-learn](https://scikit-learn.org/), [xgboost](https://xgboost.readthedocs.io/)) and make them work efficiently with `vaex`. `vaex.ml` does implement a variety of standard data transformers (e.g. PCA, numerical scalers, categorical encoders) and a very efficient KMeans algorithm that take full advantage of `vaex`.\n",
    "\n",
    "The following is a simple example on use of `vaex.ml`. We will be using the well known Iris dataset, and we will use it to build a model which distinguishes between the three Irish species ([Iris setosa](https://en.wikipedia.org/wiki/Iris_setosa), [Iris virginica](https://en.wikipedia.org/wiki/Iris_virginica) and [Iris versicolor](https://en.wikipedia.org/wiki/Iris_versicolor)).\n",
    "\n",
    "Lets start by importing the common libraries, load and inspect the data."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 1,
   "metadata": {
    "ExecuteTime": {
     "end_time": "2020-01-14T14:31:16.402715Z",
     "start_time": "2020-01-14T14:31:14.887730Z"
    }
   },
   "outputs": [
    {
     "data": {
      "text/html": [
       "<table>\n",
       "<thead>\n",
       "<tr><th>#                              </th><th>sepal_length  </th><th>sepal_width  </th><th>petal_length  </th><th>petal_width  </th><th>class_  </th></tr>\n",
       "</thead>\n",
       "<tbody>\n",
       "<tr><td><i style='opacity: 0.6'>0</i>  </td><td>5.9           </td><td>3.0          </td><td>4.2           </td><td>1.5          </td><td>1       </td></tr>\n",
       "<tr><td><i style='opacity: 0.6'>1</i>  </td><td>6.1           </td><td>3.0          </td><td>4.6           </td><td>1.4          </td><td>1       </td></tr>\n",
       "<tr><td><i style='opacity: 0.6'>2</i>  </td><td>6.6           </td><td>2.9          </td><td>4.6           </td><td>1.3          </td><td>1       </td></tr>\n",
       "<tr><td><i style='opacity: 0.6'>3</i>  </td><td>6.7           </td><td>3.3          </td><td>5.7           </td><td>2.1          </td><td>2       </td></tr>\n",
       "<tr><td><i style='opacity: 0.6'>4</i>  </td><td>5.5           </td><td>4.2          </td><td>1.4           </td><td>0.2          </td><td>0       </td></tr>\n",
       "<tr><td>...                            </td><td>...           </td><td>...          </td><td>...           </td><td>...          </td><td>...     </td></tr>\n",
       "<tr><td><i style='opacity: 0.6'>145</i></td><td>5.2           </td><td>3.4          </td><td>1.4           </td><td>0.2          </td><td>0       </td></tr>\n",
       "<tr><td><i style='opacity: 0.6'>146</i></td><td>5.1           </td><td>3.8          </td><td>1.6           </td><td>0.2          </td><td>0       </td></tr>\n",
       "<tr><td><i style='opacity: 0.6'>147</i></td><td>5.8           </td><td>2.6          </td><td>4.0           </td><td>1.2          </td><td>1       </td></tr>\n",
       "<tr><td><i style='opacity: 0.6'>148</i></td><td>5.7           </td><td>3.8          </td><td>1.7           </td><td>0.3          </td><td>0       </td></tr>\n",
       "<tr><td><i style='opacity: 0.6'>149</i></td><td>6.2           </td><td>2.9          </td><td>4.3           </td><td>1.3          </td><td>1       </td></tr>\n",
       "</tbody>\n",
       "</table>"
      ],
      "text/plain": [
       "#    sepal_length    sepal_width    petal_length    petal_width    class_\n",
       "0    5.9             3.0            4.2             1.5            1\n",
       "1    6.1             3.0            4.6             1.4            1\n",
       "2    6.6             2.9            4.6             1.3            1\n",
       "3    6.7             3.3            5.7             2.1            2\n",
       "4    5.5             4.2            1.4             0.2            0\n",
       "...  ...             ...            ...             ...            ...\n",
       "145  5.2             3.4            1.4             0.2            0\n",
       "146  5.1             3.8            1.6             0.2            0\n",
       "147  5.8             2.6            4.0             1.2            1\n",
       "148  5.7             3.8            1.7             0.3            0\n",
       "149  6.2             2.9            4.3             1.3            1"
      ]
     },
     "execution_count": 1,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "import vaex\n",
    "import vaex.ml\n",
    "\n",
    "import pylab as plt\n",
    "\n",
    "\n",
    "df = vaex.ml.datasets.load_iris()\n",
    "df"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Splitting the data into _train_ and _test_ steps should be done immediately, before any manipulation is done on the data. `vaex.ml` contains a `train_test_split` method which creates shallow copies of the main DataFrame, meaning that no extra memory is used when defining train and test sets. Note that the `train_test_split` method does an ordered split of the main DataFrame to create the two sets. In some cases, one may need to shuffle the data.\n",
    "\n",
    "If shuffling is required, we recommend the following:\n",
    "```\n",
    "df.export(\"shuffled\", shuffle=True)\n",
    "df = vaex.open(\"shuffled.hdf5)\n",
    "df_train, df_test = df.ml.train_test_split(test_size=0.2)\n",
    "```\n",
    "\n",
    "In the present scenario, the dataset is already shuffled, so we can simply do the split right away."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 2,
   "metadata": {
    "ExecuteTime": {
     "end_time": "2020-01-14T14:31:16.430056Z",
     "start_time": "2020-01-14T14:31:16.404209Z"
    }
   },
   "outputs": [
    {
     "name": "stderr",
     "output_type": "stream",
     "text": [
      "/Users/jovan/PyLibrary/vaex/packages/vaex-core/vaex/ml/__init__.py:209: UserWarning: Make sure the DataFrame is shuffled\n",
      "  warnings.warn('Make sure the DataFrame is shuffled')\n"
     ]
    }
   ],
   "source": [
    "# Orderd split in train and test\n",
    "df_train, df_test = df.ml.train_test_split(test_size=0.2)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "As this is a very simple tutorial, we will just use the columns already provided as features for training the model."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 3,
   "metadata": {
    "ExecuteTime": {
     "end_time": "2020-01-14T14:31:17.525079Z",
     "start_time": "2020-01-14T14:31:17.520285Z"
    }
   },
   "outputs": [
    {
     "data": {
      "text/plain": [
       "['sepal_length', 'sepal_width', 'petal_length', 'petal_width']"
      ]
     },
     "execution_count": 3,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "features = df_train.column_names[:4]\n",
    "features"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "ExecuteTime": {
     "end_time": "2019-12-04T14:06:45.512795Z",
     "start_time": "2019-12-04T14:06:45.510575Z"
    }
   },
   "source": [
    "## PCA\n",
    "\n",
    "The `vaex.ml` module contains several classes for dataset transformations that are commonly used to pre-process data prior to building a model. These include numerical feature scalers, category encoders, and [PCA](https://en.wikipedia.org/wiki/Principal_component_analysis) transformations. We have adopted the [scikit-learn](https://scikit-learn.org/stable/) API, meaning that all transformers have the `.fit` and `.transform` methods. \n",
    "\n",
    "Let's use apply a PCA transformation on the training set. There is no need to scale the data beforehand, since the PCA also normalizes the data."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 4,
   "metadata": {
    "ExecuteTime": {
     "end_time": "2020-01-14T14:31:18.975817Z",
     "start_time": "2020-01-14T14:31:18.926724Z"
    }
   },
   "outputs": [
    {
     "data": {
      "text/html": [
       "<table>\n",
       "<thead>\n",
       "<tr><th>#                              </th><th>sepal_length  </th><th>sepal_width  </th><th>petal_length  </th><th>petal_width  </th><th>class_  </th><th>PCA_0               </th><th>PCA_1               </th><th>PCA_2               </th><th>PCA_3                 </th></tr>\n",
       "</thead>\n",
       "<tbody>\n",
       "<tr><td><i style='opacity: 0.6'>0</i>  </td><td>5.4           </td><td>3.0          </td><td>4.5           </td><td>1.5          </td><td>1       </td><td>-0.5819340944906611 </td><td>-0.5192084328455534 </td><td>-0.4079706950207428 </td><td>-0.22843325658378022  </td></tr>\n",
       "<tr><td><i style='opacity: 0.6'>1</i>  </td><td>4.8           </td><td>3.4          </td><td>1.6           </td><td>0.2          </td><td>0       </td><td>2.628040487885542   </td><td>-0.05578001049524599</td><td>-0.09961452867004605</td><td>-0.14960589756342935  </td></tr>\n",
       "<tr><td><i style='opacity: 0.6'>2</i>  </td><td>6.9           </td><td>3.1          </td><td>4.9           </td><td>1.5          </td><td>1       </td><td>-1.438496521671396  </td><td>0.5307778852279289  </td><td>0.32322065776316616 </td><td>-0.0066478967991949744</td></tr>\n",
       "<tr><td><i style='opacity: 0.6'>3</i>  </td><td>4.4           </td><td>3.2          </td><td>1.3           </td><td>0.2          </td><td>0       </td><td>3.00633586736142    </td><td>-0.41909744036887703</td><td>-0.17571839830952185</td><td>-0.05420541515837107  </td></tr>\n",
       "<tr><td><i style='opacity: 0.6'>4</i>  </td><td>5.6           </td><td>2.8          </td><td>4.9           </td><td>2.0          </td><td>2       </td><td>-1.1948465297428466 </td><td>-0.6200295372229213 </td><td>-0.4751905348367903 </td><td>0.08724845774327505   </td></tr>\n",
       "<tr><td>...                            </td><td>...           </td><td>...          </td><td>...           </td><td>...          </td><td>...     </td><td>...                 </td><td>...                 </td><td>...                 </td><td>...                   </td></tr>\n",
       "<tr><td><i style='opacity: 0.6'>115</i></td><td>5.2           </td><td>3.4          </td><td>1.4           </td><td>0.2          </td><td>0       </td><td>2.6608856211270933  </td><td>0.2619681501203415  </td><td>0.12886483875694454 </td><td>0.06429707648769989   </td></tr>\n",
       "<tr><td><i style='opacity: 0.6'>116</i></td><td>5.1           </td><td>3.8          </td><td>1.6           </td><td>0.2          </td><td>0       </td><td>2.561545765055359   </td><td>0.4288927940763031  </td><td>-0.18633294617759266</td><td>-0.20573646329612738  </td></tr>\n",
       "<tr><td><i style='opacity: 0.6'>117</i></td><td>5.8           </td><td>2.6          </td><td>4.0           </td><td>1.2          </td><td>1       </td><td>-0.22075578997244774</td><td>-0.40152336651555137</td><td>0.25417836518749715 </td><td>0.04952191889168374   </td></tr>\n",
       "<tr><td><i style='opacity: 0.6'>118</i></td><td>5.7           </td><td>3.8          </td><td>1.7           </td><td>0.3          </td><td>0       </td><td>2.23068249078231    </td><td>0.826166758833374   </td><td>0.07863720599424912 </td><td>0.0004035597987264161 </td></tr>\n",
       "<tr><td><i style='opacity: 0.6'>119</i></td><td>6.2           </td><td>2.9          </td><td>4.3           </td><td>1.3          </td><td>1       </td><td>-0.6256358184862005 </td><td>0.023930474333675168</td><td>0.21203674475657858 </td><td>-0.0077954052328795265</td></tr>\n",
       "</tbody>\n",
       "</table>"
      ],
      "text/plain": [
       "#    sepal_length    sepal_width    petal_length    petal_width    class_    PCA_0                 PCA_1                 PCA_2                 PCA_3\n",
       "0    5.4             3.0            4.5             1.5            1         -0.5819340944906611   -0.5192084328455534   -0.4079706950207428   -0.22843325658378022\n",
       "1    4.8             3.4            1.6             0.2            0         2.628040487885542     -0.05578001049524599  -0.09961452867004605  -0.14960589756342935\n",
       "2    6.9             3.1            4.9             1.5            1         -1.438496521671396    0.5307778852279289    0.32322065776316616   -0.0066478967991949744\n",
       "3    4.4             3.2            1.3             0.2            0         3.00633586736142      -0.41909744036887703  -0.17571839830952185  -0.05420541515837107\n",
       "4    5.6             2.8            4.9             2.0            2         -1.1948465297428466   -0.6200295372229213   -0.4751905348367903   0.08724845774327505\n",
       "...  ...             ...            ...             ...            ...       ...                   ...                   ...                   ...\n",
       "115  5.2             3.4            1.4             0.2            0         2.6608856211270933    0.2619681501203415    0.12886483875694454   0.06429707648769989\n",
       "116  5.1             3.8            1.6             0.2            0         2.561545765055359     0.4288927940763031    -0.18633294617759266  -0.20573646329612738\n",
       "117  5.8             2.6            4.0             1.2            1         -0.22075578997244774  -0.40152336651555137  0.25417836518749715   0.04952191889168374\n",
       "118  5.7             3.8            1.7             0.3            0         2.23068249078231      0.826166758833374     0.07863720599424912   0.0004035597987264161\n",
       "119  6.2             2.9            4.3             1.3            1         -0.6256358184862005   0.023930474333675168  0.21203674475657858   -0.0077954052328795265"
      ]
     },
     "execution_count": 4,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "pca = vaex.ml.PCA(features=features, n_components=4)\n",
    "df_train = pca.fit_transform(df_train)\n",
    "df_train"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "The result of pca `.fit_transform` method is a shallow copy of the DataFrame which contains the resulting columns of the transformation, in this case the PCA components, as virtual columns. This means that the transformed DataFrame takes no memory at all! So while this example is made with only 120 sample, this would work in the same way even for millions or billions of samples.\n",
    "\n",
    "## Gradient boosting trees\n",
    "\n",
    "Now let's train a gradient boosting model. While `vaex.ml` does not currently include this type of models, we support the popular boosted trees libraries [xgboost](https://xgboost.readthedocs.io/en/latest/), [lightgbm](https://lightgbm.readthedocs.io/en/latest/), and [catboost](https://catboost.ai/). In this tutorial we will use the `lightgbm` classifier."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 9,
   "metadata": {
    "ExecuteTime": {
     "end_time": "2020-01-14T14:31:40.074344Z",
     "start_time": "2020-01-14T14:31:39.899968Z"
    }
   },
   "outputs": [
    {
     "data": {
      "text/html": [
       "<table>\n",
       "<thead>\n",
       "<tr><th>#                              </th><th>sepal_length  </th><th>sepal_width  </th><th>petal_length  </th><th>petal_width  </th><th>class_  </th><th>PCA_0               </th><th>PCA_1               </th><th>PCA_2               </th><th>PCA_3                 </th><th>prediction  </th></tr>\n",
       "</thead>\n",
       "<tbody>\n",
       "<tr><td><i style='opacity: 0.6'>0</i>  </td><td>5.4           </td><td>3.0          </td><td>4.5           </td><td>1.5          </td><td>1       </td><td>-0.5819340944906611 </td><td>-0.5192084328455534 </td><td>-0.4079706950207428 </td><td>-0.22843325658378022  </td><td>1           </td></tr>\n",
       "<tr><td><i style='opacity: 0.6'>1</i>  </td><td>4.8           </td><td>3.4          </td><td>1.6           </td><td>0.2          </td><td>0       </td><td>2.628040487885542   </td><td>-0.05578001049524599</td><td>-0.09961452867004605</td><td>-0.14960589756342935  </td><td>0           </td></tr>\n",
       "<tr><td><i style='opacity: 0.6'>2</i>  </td><td>6.9           </td><td>3.1          </td><td>4.9           </td><td>1.5          </td><td>1       </td><td>-1.438496521671396  </td><td>0.5307778852279289  </td><td>0.32322065776316616 </td><td>-0.0066478967991949744</td><td>1           </td></tr>\n",
       "<tr><td><i style='opacity: 0.6'>3</i>  </td><td>4.4           </td><td>3.2          </td><td>1.3           </td><td>0.2          </td><td>0       </td><td>3.00633586736142    </td><td>-0.41909744036887703</td><td>-0.17571839830952185</td><td>-0.05420541515837107  </td><td>0           </td></tr>\n",
       "<tr><td><i style='opacity: 0.6'>4</i>  </td><td>5.6           </td><td>2.8          </td><td>4.9           </td><td>2.0          </td><td>2       </td><td>-1.1948465297428466 </td><td>-0.6200295372229213 </td><td>-0.4751905348367903 </td><td>0.08724845774327505   </td><td>2           </td></tr>\n",
       "<tr><td>...                            </td><td>...           </td><td>...          </td><td>...           </td><td>...          </td><td>...     </td><td>...                 </td><td>...                 </td><td>...                 </td><td>...                   </td><td>...         </td></tr>\n",
       "<tr><td><i style='opacity: 0.6'>115</i></td><td>5.2           </td><td>3.4          </td><td>1.4           </td><td>0.2          </td><td>0       </td><td>2.6608856211270933  </td><td>0.2619681501203415  </td><td>0.12886483875694454 </td><td>0.06429707648769989   </td><td>0           </td></tr>\n",
       "<tr><td><i style='opacity: 0.6'>116</i></td><td>5.1           </td><td>3.8          </td><td>1.6           </td><td>0.2          </td><td>0       </td><td>2.561545765055359   </td><td>0.4288927940763031  </td><td>-0.18633294617759266</td><td>-0.20573646329612738  </td><td>0           </td></tr>\n",
       "<tr><td><i style='opacity: 0.6'>117</i></td><td>5.8           </td><td>2.6          </td><td>4.0           </td><td>1.2          </td><td>1       </td><td>-0.22075578997244774</td><td>-0.40152336651555137</td><td>0.25417836518749715 </td><td>0.04952191889168374   </td><td>1           </td></tr>\n",
       "<tr><td><i style='opacity: 0.6'>118</i></td><td>5.7           </td><td>3.8          </td><td>1.7           </td><td>0.3          </td><td>0       </td><td>2.23068249078231    </td><td>0.826166758833374   </td><td>0.07863720599424912 </td><td>0.0004035597987264161 </td><td>0           </td></tr>\n",
       "<tr><td><i style='opacity: 0.6'>119</i></td><td>6.2           </td><td>2.9          </td><td>4.3           </td><td>1.3          </td><td>1       </td><td>-0.6256358184862005 </td><td>0.023930474333675168</td><td>0.21203674475657858 </td><td>-0.0077954052328795265</td><td>1           </td></tr>\n",
       "</tbody>\n",
       "</table>"
      ],
      "text/plain": [
       "#    sepal_length    sepal_width    petal_length    petal_width    class_    PCA_0                 PCA_1                 PCA_2                 PCA_3                   prediction\n",
       "0    5.4             3.0            4.5             1.5            1         -0.5819340944906611   -0.5192084328455534   -0.4079706950207428   -0.22843325658378022    1\n",
       "1    4.8             3.4            1.6             0.2            0         2.628040487885542     -0.05578001049524599  -0.09961452867004605  -0.14960589756342935    0\n",
       "2    6.9             3.1            4.9             1.5            1         -1.438496521671396    0.5307778852279289    0.32322065776316616   -0.0066478967991949744  1\n",
       "3    4.4             3.2            1.3             0.2            0         3.00633586736142      -0.41909744036887703  -0.17571839830952185  -0.05420541515837107    0\n",
       "4    5.6             2.8            4.9             2.0            2         -1.1948465297428466   -0.6200295372229213   -0.4751905348367903   0.08724845774327505     2\n",
       "...  ...             ...            ...             ...            ...       ...                   ...                   ...                   ...                     ...\n",
       "115  5.2             3.4            1.4             0.2            0         2.6608856211270933    0.2619681501203415    0.12886483875694454   0.06429707648769989     0\n",
       "116  5.1             3.8            1.6             0.2            0         2.561545765055359     0.4288927940763031    -0.18633294617759266  -0.20573646329612738    0\n",
       "117  5.8             2.6            4.0             1.2            1         -0.22075578997244774  -0.40152336651555137  0.25417836518749715   0.04952191889168374     1\n",
       "118  5.7             3.8            1.7             0.3            0         2.23068249078231      0.826166758833374     0.07863720599424912   0.0004035597987264161   0\n",
       "119  6.2             2.9            4.3             1.3            1         -0.6256358184862005   0.023930474333675168  0.21203674475657858   -0.0077954052328795265  1"
      ]
     },
     "execution_count": 9,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "import lightgbm\n",
    "import vaex.ml.sklearn\n",
    "\n",
    "# Features on which to train the model\n",
    "train_features = df_train.get_column_names(regex='PCA_.*')\n",
    "# The target column\n",
    "target = 'class_'\n",
    "\n",
    "# Instantiate the LightGBM Classifier\n",
    "booster = lightgbm.sklearn.LGBMClassifier(num_leaves=5, \n",
    "                                          max_depth=5, \n",
    "                                          n_estimators=100,\n",
    "                                          random_state=42)\n",
    "\n",
    "# Make it a vaex transformer (for the automagic pipeline and lazy predictions)\n",
    "model = vaex.ml.sklearn.SKLearnPredictor(features=train_features, \n",
    "                                         target=target,\n",
    "                                         model=booster, \n",
    "                                         prediction_name='prediction')\n",
    "\n",
    "# Train and predict\n",
    "model.fit(df=df_train)\n",
    "df_train = model.transform(df=df_train)\n",
    "\n",
    "df_train"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "ExecuteTime": {
     "end_time": "2019-12-04T14:40:52.347957Z",
     "start_time": "2019-12-04T14:40:52.345452Z"
    }
   },
   "source": [
    "Notice that after training the model, we use the `.transform` method to obtain a shallow copy of the DataFrame which  contains the prediction of the model, in a form of a virtual column. This makes it easy to evaluate the model, and easily create various diagnostic plots. If required, one can call the `.predict` method, which will result in an in-memory `numpy.array` housing the predictions.\n",
    "\n",
    "## Automatic pipelines\n",
    "\n",
    "Assuming we are happy with the performance of the model, we can continue and apply our transformations and model to the test set. Unlike other libraries, we do not need to explicitly create a pipeline here in order to propagate the transformations. In fact, with `vaex` and `vaex.ml`, a pipeline is automatically being created as one is doing the exploration of the data. Each `vaex` DataFrame contains a _state,_ which is a (serializable) object containing information of all transformations applied to the DataFrame (filtering, creation of new virtual columns, transformations).\n",
    "\n",
    "Recall that the outputs of both the PCA transformation and the boosted model were in fact virtual columns, and thus are stored in the state of `df_train`. All we need to do, is to apply this state to another similar DataFrame (e.g. the test set), and all the changes will be propagated."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 6,
   "metadata": {
    "ExecuteTime": {
     "end_time": "2020-01-14T14:31:24.204646Z",
     "start_time": "2020-01-14T14:31:24.114304Z"
    }
   },
   "outputs": [
    {
     "data": {
      "text/html": [
       "<table>\n",
       "<thead>\n",
       "<tr><th>#                             </th><th>sepal_length  </th><th>sepal_width  </th><th>petal_length  </th><th>petal_width  </th><th>class_  </th><th>PCA_0               </th><th>PCA_1               </th><th>PCA_2               </th><th>PCA_3                </th><th>prediction  </th></tr>\n",
       "</thead>\n",
       "<tbody>\n",
       "<tr><td><i style='opacity: 0.6'>0</i> </td><td>5.9           </td><td>3.0          </td><td>4.2           </td><td>1.5          </td><td>1       </td><td>-0.4978687101343986 </td><td>-0.11289245880584761</td><td>-0.11962601206069637</td><td>0.0625954090178564   </td><td>1           </td></tr>\n",
       "<tr><td><i style='opacity: 0.6'>1</i> </td><td>6.1           </td><td>3.0          </td><td>4.6           </td><td>1.4          </td><td>1       </td><td>-0.8754765898560835 </td><td>-0.03902402119573594</td><td>0.022944044447894815</td><td>-0.14143773065379384 </td><td>1           </td></tr>\n",
       "<tr><td><i style='opacity: 0.6'>2</i> </td><td>6.6           </td><td>2.9          </td><td>4.6           </td><td>1.3          </td><td>1       </td><td>-1.0228803632878913 </td><td>0.2503709022470443  </td><td>0.4130613754204865  </td><td>-0.030391911559003282</td><td>1           </td></tr>\n",
       "<tr><td><i style='opacity: 0.6'>3</i> </td><td>6.7           </td><td>3.3          </td><td>5.7           </td><td>2.1          </td><td>2       </td><td>-2.2544508624315838 </td><td>0.3431374410700749  </td><td>-0.28908707579214765</td><td>-0.07059175451207655 </td><td>2           </td></tr>\n",
       "<tr><td><i style='opacity: 0.6'>4</i> </td><td>5.5           </td><td>4.2          </td><td>1.4           </td><td>0.2          </td><td>0       </td><td>2.632289228948536   </td><td>1.020394958612415   </td><td>-0.20769510079946696</td><td>-0.13744144140286718 </td><td>0           </td></tr>\n",
       "<tr><td>...                           </td><td>...           </td><td>...          </td><td>...           </td><td>...          </td><td>...     </td><td>...                 </td><td>...                 </td><td>...                 </td><td>...                  </td><td>...         </td></tr>\n",
       "<tr><td><i style='opacity: 0.6'>25</i></td><td>5.5           </td><td>2.5          </td><td>4.0           </td><td>1.3          </td><td>1       </td><td>-0.16189655085432594</td><td>-0.6871827581512436 </td><td>0.09773053160021669 </td><td>0.07093166682594204  </td><td>1           </td></tr>\n",
       "<tr><td><i style='opacity: 0.6'>26</i></td><td>5.8           </td><td>2.7          </td><td>3.9           </td><td>1.2          </td><td>1       </td><td>-0.12526327170089271</td><td>-0.3148233189949767 </td><td>0.19720893202789733 </td><td>0.060419826927667064 </td><td>1           </td></tr>\n",
       "<tr><td><i style='opacity: 0.6'>27</i></td><td>4.4           </td><td>2.9          </td><td>1.4           </td><td>0.2          </td><td>0       </td><td>2.8918941837640526  </td><td>-0.6426744898497139 </td><td>0.006171795874510444</td><td>0.007700652884580328 </td><td>0           </td></tr>\n",
       "<tr><td><i style='opacity: 0.6'>28</i></td><td>4.5           </td><td>2.3          </td><td>1.3           </td><td>0.3          </td><td>0       </td><td>2.850207707200544   </td><td>-0.9710397723109179 </td><td>0.38501428492268475 </td><td>0.377723418991853    </td><td>0           </td></tr>\n",
       "<tr><td><i style='opacity: 0.6'>29</i></td><td>6.9           </td><td>3.2          </td><td>5.7           </td><td>2.3          </td><td>2       </td><td>-2.405639277483925  </td><td>0.4027072938482219  </td><td>-0.22944817803540973</td><td>0.17443211711742812  </td><td>2           </td></tr>\n",
       "</tbody>\n",
       "</table>"
      ],
      "text/plain": [
       "#    sepal_length    sepal_width    petal_length    petal_width    class_    PCA_0                 PCA_1                 PCA_2                 PCA_3                  prediction\n",
       "0    5.9             3.0            4.2             1.5            1         -0.4978687101343986   -0.11289245880584761  -0.11962601206069637  0.0625954090178564     1\n",
       "1    6.1             3.0            4.6             1.4            1         -0.8754765898560835   -0.03902402119573594  0.022944044447894815  -0.14143773065379384   1\n",
       "2    6.6             2.9            4.6             1.3            1         -1.0228803632878913   0.2503709022470443    0.4130613754204865    -0.030391911559003282  1\n",
       "3    6.7             3.3            5.7             2.1            2         -2.2544508624315838   0.3431374410700749    -0.28908707579214765  -0.07059175451207655   2\n",
       "4    5.5             4.2            1.4             0.2            0         2.632289228948536     1.020394958612415     -0.20769510079946696  -0.13744144140286718   0\n",
       "...  ...             ...            ...             ...            ...       ...                   ...                   ...                   ...                    ...\n",
       "25   5.5             2.5            4.0             1.3            1         -0.16189655085432594  -0.6871827581512436   0.09773053160021669   0.07093166682594204    1\n",
       "26   5.8             2.7            3.9             1.2            1         -0.12526327170089271  -0.3148233189949767   0.19720893202789733   0.060419826927667064   1\n",
       "27   4.4             2.9            1.4             0.2            0         2.8918941837640526    -0.6426744898497139   0.006171795874510444  0.007700652884580328   0\n",
       "28   4.5             2.3            1.3             0.3            0         2.850207707200544     -0.9710397723109179   0.38501428492268475   0.377723418991853      0\n",
       "29   6.9             3.2            5.7             2.3            2         -2.405639277483925    0.4027072938482219    -0.22944817803540973  0.17443211711742812    2"
      ]
     },
     "execution_count": 6,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "state = df_train.state_get()\n",
    "df_test.state_set(state)\n",
    "\n",
    "df_test"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Production\n",
    "\n",
    "Now `df_test` contains all the transformations we applied on the training set (`df_train`), including the model prediction. The transfer of state from one DataFrame to another can be extremely valuable for putting models in production.\n",
    "\n",
    "## Performance\n",
    "Finally, let's check the model performance."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 7,
   "metadata": {
    "ExecuteTime": {
     "end_time": "2020-01-14T14:31:27.603150Z",
     "start_time": "2020-01-14T14:31:27.590066Z"
    }
   },
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "Test set accuracy: 100.0%\n"
     ]
    }
   ],
   "source": [
    "from sklearn.metrics import accuracy_score\n",
    "\n",
    "acc = accuracy_score(y_true=df_test.class_.values, y_pred=df_test.prediction.values)\n",
    "acc *= 100.\n",
    "print(f'Test set accuracy: {acc}%')"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "ExecuteTime": {
     "end_time": "2019-12-04T15:03:26.881910Z",
     "start_time": "2019-12-04T15:03:26.872187Z"
    }
   },
   "source": [
    "The model get perfect accuracy of 100%. This is not surprising as this problem is rather easy: doing a PCA transformation on the features nicely separates the 3 flower species. Plotting the first two PCA axes, and colouring the samples according to their class already shows an almost perfect separation."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 8,
   "metadata": {
    "ExecuteTime": {
     "end_time": "2020-01-14T14:31:28.960052Z",
     "start_time": "2020-01-14T14:31:28.793818Z"
    }
   },
   "outputs": [
    {
     "data": {
      "image/png": "iVBORw0KGgoAAAANSUhEUgAAAfsAAAEHCAYAAACp2++wAAAABHNCSVQICAgIfAhkiAAAAAlwSFlzAAALEgAACxIB0t1+/AAAADh0RVh0U29mdHdhcmUAbWF0cGxvdGxpYiB2ZXJzaW9uMy4xLjIsIGh0dHA6Ly9tYXRwbG90bGliLm9yZy8li6FKAAAgAElEQVR4nO3deZgU5bn+8e/Ty2zsm8gmoKIBFRURl8QlbkGSuEbFmGiiifGoOSYnm9EcYxbPSU78mRw10WOiMYtrNCgagkSN0agoo0FEBEGMgiAgywzM2svz+6NbMsx0DzPMdFdPzf25rrnsrqqpuguhn6633npfc3dEREQkvCJBBxAREZHCUrEXEREJORV7ERGRkFOxFxERCTkVexERkZCLBR2gEIYOHerjxo0LOoaIiEjRvPTSS++7+7Bc60JZ7MeNG0d1dXXQMURERIrGzN7Ot07N+CIiIiGnYi8iIhJyKvYiIiIhp2IvIiIScir2IiIiIRfK3vjdyd0h9Sakt0JsAhbpG3QkERGRTlGxb4cnXsO3fBVS68Ci4Am86hys35WY6Y9ORER6BlWsPDy1Bt/0GfC67ILsivr7cU9iA64NKpqIiEin6J59Hl53J3hzjjWN0PAgnt5S7EgiIiK7RMU+n+bngETudRaHxGtFjSMiIrKrVOzzsfY64vlO1ouIiJQOFfs8rGomUJlvJcQPKGoeERGRXaVin0/FxyE+mR0LfhSowAZcj5n+6EREpGdQb/w8zOIw+A5omIXX3wXpWig7FOt7MRbbO+h4IiIiHaZi3w6zOFSdjVWdHXQUERGRXaa2aBERkZBTsRcREQk5FXsREZGQU7EXEREJORV7ERGRkFOxFxERCTkVexERCa3mpgSrlr3Lpvc2Bx0lUHrOXkREQiedTnPXdQ/yh+sfAXeSiRR7HzSOb9x5GWP2HRV0vKLTlb2IiITO7Vfdzf0/fpiGrQ00bGsk0ZRg6Ysr+Pcjr2bzut43RXngxd7M7jCz9Wa2OM96M7MbzWyFmS0ysynFzigiIj1HXU0dD904h8b6ph2WuztN9c08dNOfA0oWnMCLPXAnML2d9ScDE7I/FwO3FCGTiIj0UEtfXEG8PJ5zXaIpwfxHXypyouAFXuzd/WlgUzubnAr81jPmAwPNbERx0omISE9TXlmGu+dfX1VexDSlIfBi3wGjgFUt3q/OLhMREWlj4uH7EI1Fc66rqCpnxheOL3Ki4PWEYm85lrX5ymZmF5tZtZlVb9iwoQixRESkFEVjUb5++6WUV5btsLyssow9Jo7iuPOOCihZcHpCsV8NjGnxfjSwpvVG7n6bu09196nDhg0rWjgRESk9R556KP/zxHc5dPrB9Bvcl93HDeOz15zFDU9/n7I89/PDrCc8Zz8buNzM7gUOA2rcfW3AmUREpMRNOnwf/mvOVUHHKAmBF3szuwc4FhhqZquB7wJxAHe/FZgDzABWAPXA54NJKiIi0jMFXuzd/dydrHfgsiLFERERCZ2ecM9eREREukDFXkREJORU7EVEREJOxV5ERCTkVOxFRERCTsVeREQk5FTsRUREQk7FXkREJORU7EVEREJOxV5ERCTkVOxFRERCTsVeREQk5FTsRUREQk7FXkREJORU7EVEREJOxV5ERCTkVOxFRERCTsVeREQk5FTsRUREQk7FXkREJORU7EVEREJOxV5ERCTkVOxFRERCTsVeREQk5FTsRUREQk7FXkREJORU7EVEREJOxV5ERCTkVOxFRERCTsVeREQk5FTsRUREQk7FXkREJORU7EVEREJOxV5ERCTkVOxFRERCTsVeREQk5AIv9mY23cyWmdkKM7syx/pjzazGzBZmf64JIqeIiEhPFQvy4GYWBX4OnAisBhaY2Wx3X9Jq02fc/RNFDygiIhICQV/ZTwNWuPtKd28G7gVODTiTiIhIqARd7EcBq1q8X51d1toRZvaKmf3ZzPYrTjQREZFwCLQZH7Acy7zV+5eBse6+zcxmAA8BE9rsyOxi4GKAPfbYo7tzioiI9FhBX9mvBsa0eD8aWNNyA3evdfdt2ddzgLiZDW29I3e/zd2nuvvUYcOGFTJzSXB3vHkBXvc7vHEu7k1BRxIRkRIV9JX9AmCCmY0H3gVmAp9uuYGZ7Q6sc3c3s2lkvqBsLHrSEuKpdfimz0F6LXgKLAYYDLwJK/9w0PF6lSUb1nPjC8/z4prVVMXinL3fAVx48CH0LSsLOpqIyHaBFnt3T5rZ5cBjQBS4w91fM7NLsutvBT4F/JuZJYEGYKa7t27q7zXcHd/8BUj9E0hlF2au6n3zpTBsLhYdEVi+3uS5Ve/wxUdm0ZhM4sAWGrml+gUefWMps845jz4q+CJSIoK+sv+gaX5Oq2W3tnh9M3BzsXOVrOSrkHyH7YV+Bym8/m6s39e65VCeXIHX/RoSr0N0FNbnAqxsarfsu6dzd77xl7k0JJM7LG9KpVhVW8vdr77CFw85NKB0IiI7CvqevXRWcgVYrn6NAM2QWNwth0k3zMXfPwMa/gjJxdA0D990EemtN3bL/nu6NzZtpKapMee6plSS+5d0z/8HEZHuoGLf00SGk/shBoAoREd3+RCe3gY13wQa+VcLggMNUPcrPLG8y8fo6RqTSSJ5v3RBU6srfhGRIKnY9zRlh4NV5FkZx6o+nWddJzQ9AZbvr0YCb/hD14/Rw31oyFDydR2JRSIcPXZ8kROJiOSnYt/DmEWxQbeB9QU+KPqxzOt+V2DxiV0/SLoGPN+VaQrSG7p+jB6uPBbj8kMPpzLWtttLeTTGl3S/XkRKSOAd9KTzLH4ADHsSr38QEq9AdARWdRYW27t7DhDfDyzadngjAKogvmMh88QbmRyR/lB+DJa35SFcLj7kUGKRCDctmE8ynSaVTrP34CH8zwkfY8yAAZ3e3+aGBn5R/QKzli6hOZXi0JGj+OrhH2b/3YYXIL2I9CYWxqfYpk6d6tXV1UHH6LHcHd94WqYzIIkWawxsADbsSSzSF09vwzf/W6bQY9mmf8cG/ASrODGY8AFIpFKsqq2hKh5n9779dmkfWxob+MTdv2NDfR2JdBrI9MyoiMW4/ZQzOHz0mPZ3ICK9npm95O45H5lSM760YWbY4F9DfDJQkbllYFUQ3QMbcjcW6QuA13wNEv8g05GvAbwOvB7f8vXM1X4vEY9G2XPQ4F0u9AC/erma9xvqtxd6yHaJTCb59hPz8vYPEBHpCDXjS04WGYwNuQdProTkSogOh9j+WLYHuqfWQtNzQHOO327G62/HBvy4qJl7soeWvk5zKtfYCbCubhvv1NQwduDAIqcSkbBQsZd2WWxPiO3ZdkVyBVjZ9tH7dpSCZj1n3hnN6dyFHiBilveLgIhIR6gZX3ZNZBjQzrPkUXUq64xjxo7L+9x+WTTK+EGDipxIRMJExV52TWxfiOxO7gF+KrE+5xc7UY92WZ7H+CpiMb555FHEIvqnKiK7Tp8gskvMDBv0c7D+QOUHS8EqofJ0KDsmyHg9zriBg7jvUzOZvNvulEWiVMRiDK2q4vvHHs/M/ScHHU9Eejg9eidd4ulavP4BaJ4P0SFY5VlY2ZSgY5WsRCpFfSJBv/LyvM32G+vraUwlGdG3X7tD8oqItNTeo3fqoCddYpH+WN8LgQuDjlLStjQ28IOnn2LO8mWk3elXVs6/TZ3GhQcfsv0Jhw8MqaoKKKWIhJWKvUiBNSWTnHH/3bxbW7v9OfpNjQ3cMP9Z3tu2jauPPjbYgCISerpnL1Jgjy5fxvptdTsMmAOZAXN+/+pC3q+vDyiZiPQWXSr2Zvah7goiPY97Ak/XaXS3nXj0jaXUJxM518UiEZ5b9XaRE4lIb9PVZvx5wB7dEUR6Dk9twGuvg6a/AGmI7Ib3/SqRqtOCjlaS2n9szojmnU5YRArN3Vn418U8eus8Nq2rYfJRE/nkpR9j6MjBQUfrVjst9mZ2Y75VgMbv7GU8XYtvPB3SG4HsqG7ptVD7XdK+mUifzwearxSdtu8knl+9ivpE26v7ZDrFR/YYG0AqEXF3fnbJ//Hk3X+nsS4zGuiyF1cw66Y5/HjeNUw8bELACbtPRy4pPg8sBl5q9VNN7oHRJcS8/p7MfPe0Hr61Abb9DPeGIGKVtJP22pu9Bg2mPBrdYXllLMaXpx3BgIreMSWwSLHVvF/L26+vprE+17DeUD3vlR0KPUCiKUHD1ka+d+b1pFv1s+nJOtKMvwBY7O7PtV5hZtd2eyIpbY1/BnL/w4EoNC+E8iOKmajkxaNR7j3zHG5eMJ97Xl1EbXMT4wYM5IrDj+ST+6jbi0h327h2Mz/53M0senoJsbIY6VSa6Rcex5euP594WXz7drN/MXeHQt9S/dZ6Xp+/nP2O3LdYsQuqI8X+U2TmMG3D3cd3bxwpfTtrDNL951wq43G+ceRRfOPIo4KOIhJqjfVNfPnwb7Np7WZSyTSJpswcHnNvf5LN62r4z/v+Y/u2m9/bknc/ZkbNhtqC5y2WnX4yu/smd9/ps0Fm9mD3RJKSVnkKkK/Z2aHs4GKmERHZwVP3PsvWTdtIJXdsgm9qaGb+I9WsefO97csmHbEvsXi09S4ASDYnGT85PP3Pu/MyLMc8qBI2VvkpiA4D4q3WVEC/KzErCyKWiAgAzz9Snbdp3iIRFv1tyfb3p18xg1hZ2wbusvI4Bx23PyPGh2f2zu4s9nrYuhewSF9syINQ+anMpDcYxCZgA28gUnVO0PFEpJer6JO/w2skYpRV/uuCZMT44fxg9pX0H9KPqv6VVPWrpKwizuRjJ3H1PV8tRtyi0XC50mkWGYgN+B4M+B7u3mZsdxGRoJx0wTE89/CLOa/uU8k0h83Y8VbjQR/dn/vX/pJXnnqN2o1bmXDInozae0Sx4hZNdxZ7feL3QkEXevdGSG+ByGDdQhARppwwmYOOO4B/PPEqTS0euSuvKucLPzqPPgP6tPmdaCzKlBPCPZX0Lhd7MxsDzHT3n2QXfat7IonsnKfr8a3XQcNsMt8zDa88E+v/LczKg44nIgExM67949eZ88snmHXjn9iyvpax+43hM985k0NOPDDoeIHp1Hz2ZjYUOAs4FxgFzHL3rxco2y7TfPbh5u74prMh8To7jutUDmVTiAz+TVDRREQC06X57M2sH3A68GlgH2AWsKe7j+7WlCId1fw8JJfTdgDHJmheiCcWYfFwN8mJiHRGR3rjrwcuAq4D9nL3r6FhciVA3vQM5B36oQma/l7UPCIinVFXU0fN+7VFnTG0I/fsrwJmArcAd5vZfYWNJLITFiPzPTXXuNUR2o4BICISvDdf+Sc3XvpL3qh+E8wYNnoIX7r+fD582rSCH7sjI+j91N0PA04h0xPqIWCkmX3LzPYpdECR1qxiOpCv530UKk4sZhwRkZ1atexdvnrUf7Lk+TdIJlIkm5OsXbmO//7M//LMg/MLfvwOD6rj7ivd/Tp3PwA4FBgA/LlgyUTysPh+UPExoLLVmkqo/BQWGxdAKhGR/H577f07PAr4gab6Zm75jzsL3qS/02JvZnub2YdbLnP3V4G5wMcKFUykPTbgx9DvSoiOAcohOh76X4P1vyboaCIibSx4bCHpdO6CXvP+Vjaser+gx+/IPfufkblv31o98FPgk92aSKQDzCJYn3Ohz7lBRxER2aloNPeEOwCedqLxwg5o25Fm/HHuvqj1QnevBsZ1NYCZTTezZWa2wsyuzLHezOzG7PpFZjalq8cUEREppmPOPoJoLHfBH7n37gwZMaigx+9Isc8/q0Dbm6adYmZR4OfAycAk4Fwzm9Rqs5OBCdmfi8k8FSAiItJjfPrqM+k7sA/R2I5lt7yqjCt+8cWCH78jxX6BmbVJYmYXAS918fjTgBXZzn/NwL3Aqa22ORX4rWfMBwaaWfhmKRDJqmtuZkNdHal0rkcLRaQnGjpyMLf+43848fxjts+uN/WkA7nhb9/ngKMmFvz4HblJ8BVglpmdx7+K+1Qyzz6d3sXjjwJWtXi/GjisA9uMAta23MjMLiZz5c8ee+zRxVgixbdmay3fefJxnl31NhGLUBWPcfmhh/O5g6YEPuGQiHTd0FFD+NqvLuVrv7q06MfeabF393XAkWb2UWD/7OI/ufuT3XD8XJ9grbsrdmQb3P024DbIjI3f9WgixVPT2Mhp993FpoYG0u5AmqZUkuuf/ztbmpr46uFHBh1RRHqwjjx6V2FmXwHOJDNM7i3dVOghc5U+psX70cCaXdhGpEe7e/ErbG1qyhb6f2lIJvnlywvY2tT2+VwRkY7qyD3735Bptn+VTGe567vx+AuACWY23jKTkc8EZrfaZjZwfrZX/uFAjbuvbb0jkZ5s7orlNKVSOdfFIxFeXqvvtyKy6zpyz35SdtQ8zOx24MXuOri7J83scuAxIArc4e6vmdkl2fW3AnOAGcAKMs/2f767ji9SKuLtPYMLxKIdHuxSRKSNjhT7xAcvssW5WwO4+xwyBb3lsltbvHbgsm49qEiJOeNDk3h9w3oaksk269xh6ohRAaQSkbDoSLE/0Mxqs68NqMy+NzK1uH/B0on0EmdMnMRvFy3kn1s209yiOb8yFuPaYz5Keazro2ut3bqVeSuX05hMcvjoPThw+O5d3qeI9Awd6Y2fv31RRLpFRSzOA2edyy3VL3Dfa69S19zMh4YO4z8O/zBHjR3X5f3f+MJz3FKduQOXcicemc/k4cO5/ZQzqIprSmCRsLNCz7QThKlTp3p1dXXQMURKwuMrV3DF3D+1uUVQHo3y8Qn7cv1JJweUTES6k5m95O5Tc61Trx+RkPv5ghdy9gVoSqX40/Jl1OqxPpHQU7EXCbm3a7bkXReLRHlv29YiphGRIBR2Tj0RCdyIvv3Y0tiYc10inWK3Pn2KnEikNDXWN/HE75/mmQfnE43HOPGzR/ORMw4jVuDpZ4uh55+BiLTri1OmcvWTj9OQTOywPB6JcPTYcQys6NLklSKhsGVDDV8+7Cq2bKihsS5za2vR00t44IZH+H9PfY/yyvKAE3aNmvFFQu7UfSfyyX32pSIWI5KdaqJPPM4eAwby4+M/FnA6kdJw85fvYMO7G7cXeoDGbY289eo73PNffwwwWfdQb3yRXmLx+nU8smwp9YkER40dy3Hj9yIW0fd9kebGZk4bdAGJprYdWQEGDOvPA+tuL3KqzmuvN76a8UV6if13G87+uw0POoZIyamrbWh3Gum6LfVFTFMY+lovIiK9Wv8hfSmvyn9PfvS+I4uYpjBU7EVEpFeLRqPMvPK0nAW/vKqcC753dgCpupea8UVEpNc762unsGnNZh65dR7ReAwDUskUn/vBOXzk9MOCjtdl6qAnIiKh09yUYM2K9+gzoIpho4d0+Pc2r6/hlb8uJhqPMeWEA+jTv6qAKbuXOuiJiEivkE6nueuHD/KH62eDQTKRYuzE0XzzN5czfv89dvr7g3YbwLHnfLgISYtL9+xFRCQ0fv2de7j/Jw/TsK2Rhq2NJBoTvLnwLb7yke/w/rsbg44XGBV7EemQhkSCp/75Fk+89aYmz5GSVL+1gT/+75wdBsYBcIfmxgQP/PTRgJIFT834IrJTv3vlH/zo2WeIRjLPIidSKS45ZBr/ftgR7T6fLFJMy19eSTwepbmh7bpkc5IFf/4Hl1x/QfGDlQAVexGhrrmZ3y1ayP2vvUpjKslHxozl3w49jPEDBzHvzeX86Nmn20yTe9vLCxhSVcVnJh/Eys2bWLx+HQMrKjli9Bji0WhAZyK9WXllGel2Op239yx92KnYi/Ry25qbOeO+u1hdW0NjKgXArKVLmLPiDe4642xumP9sm0IP0JBM8rMXnmPem8tZsGYNsYgBRixi3Dzjk3x4zNgin4n0dhMO2ZPyynIatrad5bG8qpwZXzghgFSlQffsRUpYXXMz9y5exLefmMdNLz7Pu1tru/0Ydy58iVW1tdsLPUDKnfpEgm/+ZS5vbtqU93c3NzTw4pp3aUolqUskqEs0U9PUxMWPPMQ7NVu6PatIe6LRKN/49WWUV5XtcHuprLKMPSaO4qQLjgkwXbD0nL30Op58G2+4D5LvQHwiVnk2Fh0WdKw2lr6/gXMfvJ9EOkV9IkE8EiFixnePOY6Z+0/utuMcfecvWV2b+0tERSxG1CLUJZo7tc94JMK5+0/m2mOP746IIp2y/OWV/P4HD7DkuWVU9a9kxhdP4LQvn9zjp6ndGT1nL5KVrn8Aar8HpIAkNP0Nr7sNBt6KlR8RdLzt0u58/uE/UtP0r+bIRDoNwPef/iuHjhzFXoM7PlBIexoTuWf6AoiacdKee/Gn5W/QnE7tsC4CRCIRktlcLSXSaf7x3tpuySfSWROm7Mn3Zn0z6BglRc340mt4cnW20DcBHxS4JvAGfMuluLe9zxeU+atXsa059+NtiVSK3y1a2G3HOmz0GCJ5etRHzLj6qGMZO3AglbF/XRtUxmIMrqyirJ2OeMP79O22jCLSNSr20mt4w4NA26vQ7RofL1qWnXl3ay35brCl3Hlry+ZuO9aXpx1OeY6iXRmLcfmhhzO4qorZMz/DNcccx7SRozhkxEi+ceRRPH7+hfQvz90sWhmL89nJB3VbRhHpGjXjS++RWg0kcq/zZkivL2qc9owfOCjvungkysShu3XbsfYZMpRfn3om33x8Lhvq6ohY5hrg8mmH8YUpmdt/5bEY5+x3AOfsd8AOv3vrx0/lvD/+gWQ6RVMqhQEVsThnfGgSH9lDvfFFSoWKvfQe8f2gcR6QY8QNK4PY3kWPlM8hI0YyvE9f3q7Z0ua54WjE+MzkA7v1eNNGjeav51/EW1s205BIsPfgIZTHdv7xMHn47jx5/oXc9eorvPjuKoZU9eG8Aw7ksFGjNdiOSAlRb3zpNTxdg284Fryu1ZoIREZgw57ArHTubK2ureHcB+9nS2MDjckk5bEY7s6N0z/B8XvuFXQ8ESkx6o0vAlhkAAy6E9/8RSABngSLQmQwNujOkir0AKP7D+CpCy7ib2//k2UbNzC0qg/T95pAvzz3yUVE8lGxl17Fyg6E3Z6Fpmcg/R5Ex0PZYSVX6D8QjUQ4bvyeHDd+z6CjiEgPpmIvvY5ZHCqOCzqGiEjRlObljIiIiHQbFXsREZGQU7EXEREJORV7ERGRkFOxFxERCbnAeuOb2WDgPmAc8E/gbHdvM+C3mf0T2Ep2mrJ8AwaIiIhIbkE+encl8IS7/8jMrsy+/1aebT/q7u8XL5qIfGDt1q3cufBl/r7qbfqXl/PpAw5kxt77EI2oYVCkpwiy2J8KHJt9/RvgKfIXe5EucXfwBrCKkh1ApxQtXr+Ocx+8n+ZUikR2PvtX16/jgSWLuf2UM4ip4Iv0CEH+Sx3u7msBsv/NN42XA/PM7CUzuzjfzszsYjOrNrPqDRs2FCCu9ETuKdLbfoGvn4avPwRfP4V07X+X1Nz1pcrd+fe5j1KXaN5e6AHqEwmq17zLQ0uXBJhORDqjoMXezB43s8U5fk7txG4+7O5TgJOBy8zs6Fwbuftt7j7V3acOGzasW/JLz+c134Btt4LXACnweqi/G9/0ecI4CVR3WrFpE+u2bcu5riGZ5HeLFhY5kYjsqoI247v7CfnWmdk6Mxvh7mvNbASQczJxd1+T/e96M5sFTAOeLkhgCRVProDGx4HWV/FNkHwdmp+H8iODiNYj1DQ1tttMX9Oo1hGRniLIZvzZwAXZ1xcAD7fewMz6mFm/D14DJwGLi5ZQeramp8k8xJGD1+ONfylqnJ5mnyFDaE7l/vOLmjF15KgiJxLp+ZqbEix+dilL5r9BMpEs2nGD7KD3I+B+M7sIeAc4C8DMRgK/cvcZwHBglplBJuvd7j43oLzS41j2J986dS5rT//yCs6atD8PvP4ajckdP5Ti0SiXTJ0WUDKRnmn2L+Zy+7fvznz8OESiES676UJOOC/n3eluFVixd/eNwPE5lq8BZmRfrwQOLHI0CYvyj8LWG3Kvs0qscnpx8/RA/3n0R0mk0zy0dAll0Shph4pYjJ9Nn8Heg4cEHU+kx3ji7me47Zu/p6m+aYflP/vS/zFgSD8OnX5wQY9vYeykNHXqVK+urg46hpSAdM1V0PAnoKHF0gooOwQbdAfZViPZiY319Sxev46+5WUcvPtIIvpzE+kwd+cz4y9l/Tu5h4vZ66Bx3PryT7p8HDN7Kd/Ac5rPXkLN+v8Qj+0Ldb+E9HqwgVD1Wazvl1ToO2FIVRXHjBsfdAyRHqlhWyMb17QZIHa7lYveLngGFXsJNbMI1ucC6HMB7q4CLyJFFy+P0d5HT3llWcEzqIeS9Boq9CLSUel0mtm3PMb5e1/GJ/qcx+c/dAWP/fqvuzQ+R7wszrQZU4hE25bcWDzK8ecd1R2R26ViLyIi0sqPz7+J277xO9auXE9TQzOr31jDzV++nV9ccccu7e/ymy6i/5B+lFXEty8rq4wzeMQgPv/Dc7srdl4q9iIiIi288dKbPPvQi216zjfWNzHnV0+w5s33Or3PYaOH8KvFN3D2N09lj4mjGLf/GD57zdnc9sr1DBjav7ui56V79iJSUjY11PPkWytJpNMcMXoM4wYOCjqS9DJP/+F5Eo2JnOs87Tw760XO+vopnd7vgKH9ueDac7jg2nO6GrHTVOxFpNPS7ty7eBG3vbSAdXXbGN63L5ccMo1z9jugS30jblnwAje++DzRSAR3J+3OiXvuzf876WTi0Wg3noFIfonmJOl07nvzqVSaRHPxRr7rLmrGF5FOu/Lxx7jumad4p7aGplSKd2pq+MHTf+WqJ3Z9COJ5by7n5gXzaUqlqE8kaEgmaUqlePytN7n++b93Y3qR9k07+WAq+lbkXFdWEWfqx3reWG8q9iLSKUvf38Cjy5fR0GoI3YZkkoffeJ0Vmzbu0n5venF+m30CNCaT/H7RKzTlWCdSCAcffwBjJ44iXh7fYXlZRRn7Hbkv+xyyV0DJdp2KvYh0ymNvLieRZ4KcRCrF3BXLd2m/b23JP+iIARvq63ZpvyKdFYlE+MkT3+X4T3+EssoyyqsyPyd/4Ti+//C3go63S3TPXkQ6JZFKk87zrHHaPe9MeTszuLKK+kRNznVJTzOwonKX9iuyKyr7VsNQfIoAAAnISURBVPK12y/l8psvonbjNgYM609Zqyv9nkRX9iLSKUePHUdlPPeHXmU8zjHjxu3Sfi88aAqVsbbXH/FIhI+OG0/fssKPMibSWnllOcNGD+nRhR5U7EWkkw4dOYr9hu1Geave8eXRGAfsNpwpu4/cpf1+ZvJBHDZ6DFUtvkj0iccZ2a8/1x13Ypcyi/R2mvVORDqtIZHgv575Gw8ufQ3I3FM/c+J+XHXUMVTEdv0KyN15dtU7PLR0CY3JJCfutTfT95pAeY4r/lKxuraG9XV1jB0wkCFVVUHHkV6svVnvVOxFZJc1JZNsbmxgUEVlSRfkQni3tpYvz32U1zespywapSmV4sQ99+JHx3+MPrrlIAFor9irGV9Edll5LMbuffv1ukLfkEhw5v138+q692hKpdja3ExzKsVfVr7Jlx59OOh4Im2o2IuIdNLsN5ayLdFMqlXLaHMqxcvvreH19zcElEwkNxV7EZFOeuadf1KfyDN2ujvVa94tciKR9qnYi4h00oDyCiJ55gCIRSL00z17KTEq9iIinXTmxP3aPHr4gWTaOW58zxtOVcJNxV5EpJMO3n0En9znQ1S1eMzQgMpYjGuPOY7+5eXBhRPJoXd1oRUR6QZmxn8ffxJHjx3Prxe+xHvbtvGhoUO5ZOo0DhkxKuh4Im2o2IuI7AIzY8aEfZgxYZ+go4jslJrxRUREQk7FXkREJORU7EVEREJOxV5ERCTkVOxFRERCTsVeREQk5FTsRUREQk7FXkREJOQ0qI6IiEgLb736NtWPvUIkGuHIUw9lxJ7Dg47UZSr2IiIiQDKR5Iczf0r13IWkUinMItxx9d1Mv/A4Lr/pIizPTIc9gZrxRUREgDuvuY/quQtpamgm2Zwi0ZSguTHBvDuf4s+/eiLoeF2iYi8iIr1eKpli9i/m0tTQ3GZdY30T9/xoVgCpuk9gxd7MzjKz18wsbWZT29luupktM7MVZnZlMTOKiEjvsG1LHcnmVN71G1ZtLGKa7hfklf1i4Azg6XwbmFkU+DlwMjAJONfMJhUnnoiI9BZ9BlQRiea/Jz9wt/5FTNP9Aiv27v66uy/byWbTgBXuvtLdm4F7gVMLn05ERHqTWDzGCZ89hrKKeJt15VXlnPGVTwSQqvuU+j37UcCqFu9XZ5e1YWYXm1m1mVVv2LChKOFERCQ8vnT9+ex54Dgq+1YAYGZU9ClnygkHcOZXPh5wuq4p6KN3ZvY4sHuOVVe7+8Md2UWOZZ5rQ3e/DbgNYOrUqTm3ERERyaeyTwX/++wPefnxV5n/yAKi8RhHf+oIJh2xT49+7A4KXOzd/YQu7mI1MKbF+9HAmi7uU0REJKdIJMLUkw5k6kkHBh2lW5V6M/4CYIKZjTezMmAmMDvgTCIiIj1KkI/enW5mq4EjgD+Z2WPZ5SPNbA6AuyeBy4HHgNeB+939taAyi4iI9ESBDZfr7rOANqMUuPsaYEaL93OAOUWMJiIiEiql3owvIiIiXaRiLyIiEnIq9iIiIiFn7uF7JN3MNgBvB52ji4YC7wcdooh62/lC7zvn3na+0PvOubedL5TWOY9192G5VoSy2IeBmVW7e94JgsKmt50v9L5z7m3nC73vnHvb+ULPOWc144uIiIScir2IiEjIqdiXrtuCDlBkve18ofedc287X+h959zbzhd6yDnrnr2IiEjI6cpeREQk5FTsRUREQk7FvkSZ2Q/MbJGZLTSzeWY2MuhMhWZmPzGzpdnznmVmA4POVEhmdpaZvWZmaTMr+Ud3usLMppvZMjNbYWZXBp2n0MzsDjNbb2aLg85SDGY2xsz+amavZ/9OXxF0pkIzswoze9HMXsme8/eCztQe3bMvUWbW391rs6//HZjk7pcEHKugzOwk4El3T5rZjwHc/VsBxyoYM5sIpIH/A77u7tUBRyoIM4sCbwAnAqvJTF19rrsvCTRYAZnZ0cA24Lfuvn/QeQrNzEYAI9z9ZTPrB7wEnBby/8cG9HH3bWYWB/4OXOHu8wOOlpOu7EvUB4U+qw8Q+m9l7j4vO60xwHxgdJB5Cs3dX3f3ZUHnKIJpwAp3X+nuzcC9wKkBZyood38a2BR0jmJx97Xu/nL29VYyU5KPCjZVYXnGtuzbePanZD+nVexLmJldZ2argPOAa4LOU2QXAn8OOoR0i1HAqhbvVxPyQtCbmdk44GDghWCTFJ6ZRc1sIbAe+Iu7l+w5q9gHyMweN7PFOX5OBXD3q919DHAXcHmwabvHzs45u83VQJLMefdoHTnfXsByLCvZKyDZdWbWF3gQ+Eqr1slQcveUux9EphVympmV7C2bWNABejN3P6GDm94N/An4bgHjFMXOztnMLgA+ARzvIehQ0on/x2G2GhjT4v1oYE1AWaRAsvetHwTucvc/Bp2nmNx9i5k9BUwHSrJTpq7sS5SZTWjx9hRgaVBZisXMpgPfAk5x9/qg80i3WQBMMLPxZlYGzARmB5xJulG2s9rtwOvufkPQeYrBzIZ98MSQmVUCJ1DCn9PqjV+izOxBYF8yvbXfBi5x93eDTVVYZrYCKAc2ZhfND/MTCGZ2OnATMAzYAix0948Fm6owzGwG8DMgCtzh7tcFHKmgzOwe4Fgy05+uA77r7rcHGqqAzOwjwDPAq2Q+swCucvc5waUqLDObDPyGzN/pCHC/u38/2FT5qdiLiIiEnJrxRUREQk7FXkREJORU7EVEREJOxV5ERCTkVOxFRERCTsVeREQk5FTsRaQNM0tlp1debGZ/MLOq7PLdzexeM3vTzJaY2Rwz26fF733VzBrNbEAHjvHt7JS3y8wslOMLiJQKFXsRyaXB3Q/KTs/aDFySHSVtFvCUu+/l7pOAq4DhLX7vXDIj5p3e3s7NbBKZkfT2IzPE6C+yU+GKSAGo2IvIzjwD7A18FEi4+60frHD3he7+DICZ7QX0Bb5Dpui351TgXndvcve3gBVkpsIVkQJQsReRvMwsBpxMZhjU/YGX2tn8XOAeMl8O9jWz3drZVtPeihSRir2I5FKZnae7GniHzCQnOzOTzNV6GvgjcFY722raW5Ei0hS3IpJLQ3ae7u3M7DXgU7k2zk4KMgH4S+bWPmXASuDnefavaW9FikhX9iLSUU8C5Wb2xQ8WmNmhZnYMmSb8a919XPZnJDDKzMbm2ddsYKaZlZvZeDJfFF4s9AmI9FYq9iLSIZ6ZIvN04MTso3evAdeSuSKfSaanfkuzsstz7es14H5gCTAXuMzdUwWKLtLraYpbERGRkNOVvYiISMipg56IFEx2ZLwft1r8lru3O+iOiHQvNeOLiIiEnJrxRUREQk7FXkREJORU7EVEREJOxV5ERCTk/j/ABTVMaQ4QVQAAAABJRU5ErkJggg==\n",
      "text/plain": [
       "<Figure size 576x288 with 1 Axes>"
      ]
     },
     "metadata": {
      "needs_background": "light"
     },
     "output_type": "display_data"
    }
   ],
   "source": [
    "plt.figure(figsize=(8, 4))\n",
    "df_test.scatter(df_test.PCA_0, df_test.PCA_1, c_expr=df_test.class_, s=50)\n",
    "plt.show()"
   ]
  }
 ],
 "metadata": {
  "kernelspec": {
   "display_name": "Python 3",
   "language": "python",
   "name": "python3"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.7.6"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 2
}
